Abstract

We study the effect of addition on the Hamming weight of a positive integer. Consider the first $2^n$ positive integers, and fix an $\alpha$ among them. We show that if the binary representation of $\alpha$ consists of $\Theta(n)$ blocks of zeros and ones, then addition by $\alpha$ causes a constant fraction of low Hamming weight integers to become high Hamming weight integers. This result has applications in complexity theory to the hardness of computing powering maps using bounded-depth arithmetic circuits over $\mathbb{F}_2$. Our result implies that powering by an $\alpha$ composed of many blocks requires exponential-size, bounded-depth arithmetic circuits over $\mathbb{F}_2$.


1 Introduction 

We begin with a natural, but largely unstudied, question: how does the Hamming weight of an integer (written in base 2) change under addition? To make this precise, we take $\alpha < 2^n$ to be a fixed integer and let $S$ be chosen uniformly at random from $\{1, 2, \dots, 2^n\}$. Write $S$ in binary, and take $X$ to be its Hamming weight. Let $Y$ be the Hamming weight of the translation $S + \alpha$. What can we say about the joint distribution $(X, Y)$ of the initial and final weights?

Our question is motivated by the problem of determining the complexity of powering maps in $\mathbb{F}_{2^n}$. This problem has been studied extensively in complexity theory […]. Recently, Kopparty [1] showed that certain powering maps $x \mapsto x^{\alpha}$ from $\mathbb{F}_{2^n}$ to $\mathbb{F}_{2^n}$ cannot be computed with polynomial-size, bounded-depth arithmetic circuits over $\mathbb{F}_2$ (a.k.a. $\mathrm{AC}^0(\oplus)$ circuits). Recall that arithmetic circuits are only allowed addition and multiplication gates (of unbounded fan-in). A major advantage of working in $\mathrm{AC}^0(\oplus)$ is that it is basis invariant: determining the $\mathrm{AC}^0(\oplus)$ complexity of powering does not depend on the choice of basis for $\mathbb{F}_{2^n}$. At the core of Kopparty's argument was the following shifting property of $\alpha$: a constant fraction of elements in $\mathbb{Z}_{2^n-1}$ change from low to high Hamming weight under translation by $\alpha$.

Definition 1.1. Let $M = \{x \in \mathbb{Z}_{2^n-1} \mid \mathrm{wt}(x) \le n/2\}$, where $\mathrm{wt}(x)$ is the Hamming weight of $x$. We say that $\alpha \in \mathbb{Z}_{2^n-1}$ has the $\epsilon$-shifting property if
$$|M \cup (\alpha + M)| \ge \left(\tfrac{1}{2} + \epsilon\right) 2^n.$$

We say that any binary string in $M$ is light, and any binary string not in $M$ is heavy. Then $\alpha$ has the $\epsilon$-shifting property if translating $\mathbb{Z}_{2^n-1}$ by $\alpha$ takes a constant fraction of light strings to heavy strings. Kopparty proved that powering by any $\alpha$ with the $\epsilon$-shifting property requires exponential circuit size in $\mathrm{AC}^0(\oplus)$ [1]. Our main result is that any $\alpha$ with many blocks of 0's and 1's in its binary representation has the $\epsilon$-shifting property, proving a conjecture of Kopparty.
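For small $n$, the quantity in Definition 1.1 can be computed by brute force, which is a convenient sanity check of the definition. The following Python sketch is our own illustration, not part of the original argument: the function names `weight` and `shift_ratio` are hypothetical, light means weight at most $n/2$ as above, and the enumeration is only feasible for small $n$.

```python
def weight(x: int) -> int:
    """Hamming weight of a non-negative integer."""
    return bin(x).count("1")


def shift_ratio(alpha: int, n: int) -> float:
    """Return |M ∪ (alpha + M)| / 2^n, with M the light elements of Z_{2^n - 1}."""
    mod = (1 << n) - 1
    # Elements of Z_{2^n - 1} are represented by 0, ..., 2^n - 2.
    light = {x for x in range(mod) if weight(x) <= n / 2}
    shifted = {(x + alpha) % mod for x in light}
    return len(light | shifted) / (1 << n)


print(shift_ratio(0, 9))            # 0.5: translating by 0 leaves M unchanged
print(shift_ratio(0b101010101, 9))  # an alpha whose 9 bits form 9 blocks
```

For odd $n = 9$ the light set has exactly $2^{8}$ elements, so $\alpha = 0$ gives ratio exactly $1/2$. Any $\alpha$ that moves even one light string to a heavy one pushes the ratio above $1/2$; the $\epsilon$-shifting property asks for a gap that stays bounded away from zero as $n$ grows.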

Theorem 1.2. For every $c > 0$ there exists $\epsilon > 0$ such that the following holds. Let $a \in \{0,1\}^n$ be a bit-string of the form $a = \sigma_1 \sigma_2 \cdots \sigma_m$, where $m \ge cn$, each $\sigma_i$ is either $0^{L_i}$ or $1^{L_i}$, and each $L_i$ is chosen to be maximal. Let $\alpha \in \mathbb{Z}_{2^n-1}$ have base 2 representation given by $a$. Then $\alpha$ has the $\epsilon$-shifting property.

Note that the theorem still applies in the setting of integer addition, not just to addition mod $2^n - 1$. Our result states that $\alpha$ with $\Theta(n)$ blocks have the $\epsilon$-shifting property. It is not difficult to show that $\alpha$ with $o(\sqrt{n})$ blocks do not have the $\epsilon$-shifting property. First, observe that $o(\sqrt{n})$-sparse $\alpha$ (i.e. $\alpha$ with Hamming weight $o(\sqrt{n})$) do not have the $\epsilon$-shifting property, because addition by $\alpha$ can only increase the weight by $o(\sqrt{n})$. Since there are $O(2^n/\sqrt{n})$ light binary strings of any fixed weight, and only the $o(\sqrt{n})$ weights just below $n/2$ can contribute, we get $o(2^n)$ light strings changing to heavy strings under translation by $\alpha$.

Next, observe that any $\alpha$ with $o(\sqrt{n})$ blocks can be written as a difference of two $o(\sqrt{n})$-sparse strings: $\alpha = \beta - \gamma$. Since translating by $\alpha$ is equivalent to first translating by $\beta$ and then by $-\gamma$, we find that $\alpha$ with $o(\sqrt{n})$ blocks does not have the $\epsilon$-shifting property. Thus, at least qualitatively, we see a strong connection between the $\epsilon$-shifting property and the number of blocks. Establishing a full characterization of the $\epsilon$-shifting property remains an interesting open question.
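The decomposition $\alpha = \beta - \gamma$ can be made explicit. A maximal run of 1s occupying bits $s, \dots, e-1$ equals $2^{e} - 2^{s}$, so $\beta$ and $\gamma$ each receive one bit per run of 1s and have weight at most the number of blocks. Below is a minimal Python sketch of this construction. It works with ordinary integers, so $\beta$ may use bit $n$, and the helper names are hypothetical.

```python
def one_runs(alpha: int):
    """Yield (start, end) for each maximal run of 1 bits; bits start..end-1 are all 1."""
    pos = 0
    while alpha >> pos:
        if (alpha >> pos) & 1:
            start = pos
            while (alpha >> pos) & 1:
                pos += 1
            yield start, pos
        else:
            pos += 1


def sparse_difference(alpha: int) -> tuple[int, int]:
    """Write alpha = beta - gamma with one bit in beta and one in gamma per run of 1s."""
    runs = list(one_runs(alpha))
    # A run of 1s occupying bits start..end-1 equals 2^end - 2^start.
    beta = sum(1 << end for _, end in runs)
    gamma = sum(1 << start for start, _ in runs)
    return beta, gamma


print(sparse_difference(0b0111001110))  # (528, 66), and 528 - 66 = 462 = 0b0111001110
```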

1.1 Related Work 

Kopparty gave a different condition for when $\alpha$ has the $\epsilon$-shifting property: its binary representation consists mostly of a repeating constant-length string that is not all zeros or all ones [1]. Note that any integer expressible as … , where $a, b, g \in \mathbb{Z}$, $g > 1$ is odd, and $0 < |a|, |b| < g$, has a binary representation of this form. As a consequence, taking $g$-th roots and computing $g$-th residue symbols cannot be done with polynomial-size $\mathrm{AC}^0(\oplus)$ circuits. Our main result generalizes Kopparty's condition, as the periodic strings form a small subset of the strings with $\Theta(n)$ blocks.

Beck and Li showed that the $g$-th residue map is hard to compute in $\mathrm{AC}^0(\oplus)$ by using the concept of algebraic immunity [2]. It is worth noting that their method does not say anything about the complexity of the $g$-th root map in $\mathrm{AC}^0(\oplus)$, so in this regard there is something to be gained by analyzing the $\epsilon$-shifting property. A more detailed history of the complexity of arithmetic operations using low-depth circuits can be found in [1].

2 Application

It is known that powering by sparse $\alpha$ has polynomial-size circuits in $\mathrm{AC}^0(\oplus)$. Kopparty's work shows that powering by $\alpha$ with the $\epsilon$-shifting property requires exponential-size circuits in $\mathrm{AC}^0(\oplus)$. We will use this result, along with our new generalized criterion for when $\alpha$ has the $\epsilon$-shifting property, to expand the class of $\alpha$ whose powers are difficult to compute in $\mathrm{AC}^0(\oplus)$.

The proof resembles the method of Razborov and Smolensky for showing that Majority is not in $\mathrm{AC}^0(\oplus)$ […]. For $\alpha$ with the $\epsilon$-shifting property, we can show that if powering by $\alpha$ is computable by an $\mathrm{AC}^0(\oplus)$ circuit, then every function $f : \mathbb{F}_{2^n} \to \mathbb{F}_{2^n}$ is well-approximated by the sum of a low-degree polynomial and a function that lies in a low-dimensional space. The fact that there are not enough such functions provides the desired contradiction. In this way, we show that certain powers require exponential-size circuits in $\mathrm{AC}^0(\oplus)$.

As a consequence of Theorem 1.2 and the above Razborov-Smolensky method, we get that powering by any $\alpha$ with $\Theta(n)$ maximal uniform blocks requires an exponential-size $\mathrm{AC}^0(\oplus)$ circuit. This greatly expands the class of powers that are hard to compute in $\mathrm{AC}^0(\oplus)$.

Theorem 2.1. Let $\alpha \in \mathbb{Z}_{2^n-1}$ have base 2 representation in the form given by Theorem 1.2. Define $A : \mathbb{F}_{2^n} \to \mathbb{F}_{2^n}$ by $A(x) = x^{\alpha}$. Then for every $\mathrm{AC}^0(\oplus)$ circuit $C : \mathbb{F}_{2^n} \to \mathbb{F}_{2^n}$ of depth $d$ and size $M \le 2^{n^{\epsilon_0}}$, for sufficiently large $n$ we have
$$\Pr[C(x) = A(x)] \le 1 - \epsilon_0,$$
where $\epsilon_0 > 0$ depends only on $c$ and $d$.


3 The Proof of the Main Result 

3.1 Outline of Proof 

Suppose we have a bit-string of length $n$. The bit-string is called light if its Hamming weight is at most $n/2$, and heavy otherwise. It is enough to show that translation by $\alpha$ in $\mathbb{Z}_{2^n-1}$ transforms some positive constant fraction of the light bit-strings into heavy bit-strings.

We choose a binary string of length $n$ uniformly at random, translate it by $\alpha$, and look at the joint distribution of its initial weight $X$ and final weight $Y$. Let $(\tilde{X}, \tilde{Y}) = (X - \mathbb{E}[X], Y - \mathbb{E}[Y])$, so that when plotted, the plane is split into four quadrants. The fraction of strings that shift weight from light to heavy is the proportion of the distribution in the second quadrant. By symmetry, the same proportion of the distribution lies in the fourth quadrant. We will prove that some constant fraction of the distribution lies in the second or fourth quadrant.

To get a handle on the distribution, we break up $\alpha$ into its $m$ uniform blocks of 0's and 1's, and consider addition on each block separately. The distribution of the initial and final weights of any block is determined by the carry bit from the addition on the previous block and the carry bit going into the next block. Thus, once the carry bits are given, the weight distributions on the blocks are independent. Although we will not be able to specify the distribution of the carry bits, we will show that with probability at least $1/6$ the carry bits have a certain property. Whenever they have this property, the conditional distribution of $(\tilde{X}, \tilde{Y})$ has a positive constant fraction of its mass in the second or fourth quadrants.

3.2 Notation and Overview 

First, observe that it suffices to prove the main result for $M$ viewed as a subset of $\mathbb{Z}_{2^n}$ instead of $\mathbb{Z}_{2^n-1}$. Only one element, $1^n \in \mathbb{Z}_{2^n}$, is not an element of $\mathbb{Z}_{2^n-1}$. Also, when translating by $\alpha$, the resulting bit-string in $\mathbb{Z}_{2^n-1}$ is either the same as, or one more than, the resulting bit-string in $\mathbb{Z}_{2^n}$. Only $o(2^n)$ of the heavy bit-strings of $\mathbb{Z}_{2^n}$ transform into light bit-strings under translation by 1. Hence, if $\Theta(2^n)$ light bit-strings become heavy under translation by $\alpha$ in $\mathbb{Z}_{2^n}$, then at least $\Theta(2^n) - o(2^n) = \Theta(2^n)$ light bit-strings become heavy under translation by $\alpha$ in $\mathbb{Z}_{2^n-1}$. This shows that we can work in the symmetric environment $\mathbb{Z}_{2^n}$ of all bit-strings of length $n$ and still achieve the result we want.
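The claim that addition in $\mathbb{Z}_{2^n-1}$ differs from addition in $\mathbb{Z}_{2^n}$ by at most one (the end-around carry) is easy to confirm exhaustively for small $n$. Below is a brute-force sketch. `check_end_around` is a hypothetical helper, and elements of $\mathbb{Z}_{2^n-1}$ are represented by $0, \dots, 2^n - 2$.

```python
def check_end_around(n: int) -> bool:
    """Check (x + a) mod (2^n - 1) is (x + a) mod 2^n or one more than it (mod 2^n)."""
    m1, m2 = (1 << n) - 1, 1 << n
    for a in range(m1):
        for x in range(m1):
            r1 = (x + a) % m1  # addition in Z_{2^n - 1}
            r2 = (x + a) % m2  # addition in Z_{2^n}
            if r1 != r2 and r1 != (r2 + 1) % m2:
                return False
    return True


print(all(check_end_around(n) for n in range(1, 9)))  # True
```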

Let $S \in \mathbb{Z}_{2^n}$ be chosen uniformly at random. Let $T = \alpha + S$. Let $X = \mathrm{wt}(S)$ and $Y = \mathrm{wt}(T)$.

Write $a = \sigma_1 \sigma_2 \cdots \sigma_m$, $S = S_1 S_2 \cdots S_m$, and $T = T_1 T_2 \cdots T_m$, where each of the $i$-th parts has length $L_i$. Let $X_i = \mathrm{wt}(S_i)$ and $Y_i = \mathrm{wt}(T_i)$. Then
$$(X, Y) = \left(\sum_{i=1}^{m} X_i, \sum_{i=1}^{m} Y_i\right).$$
Let $(\tilde{X}, \tilde{Y}) = (X - \mathbb{E}[X], Y - \mathbb{E}[Y])$. Then the part of the distribution of $(\tilde{X}, \tilde{Y})$ in the second quadrant corresponds to light bit-strings translating to heavy bit-strings. Similarly, the fourth quadrant corresponds to heavy to light bit-string translation. To avoid having to pass to analogously defined $(\tilde{X}_i, \tilde{Y}_i)$ all the time, any reference to the second or fourth quadrant will be understood to be relative to the mean of the random vector in question, such as $(X_i, Y_i)$. We want to show that a positive constant fraction of the distribution lies in the second or fourth quadrants.

However, the terms in the sums $\left(\sum_{i=1}^{m} X_i, \sum_{i=1}^{m} Y_i\right)$ are highly dependent.

To get around this, we condition on a fixing of the carry bits. Once the carry bits are fixed, the terms in the sum are independent. We will show that with probability at least $1/6$ we can find $\Theta(n)$ terms with identical distribution. Since the terms are independent, we use the multidimensional Central Limit Theorem to show that these identical distributions sum to a Gaussian distribution with dimensions of size $\Theta(\sqrt{n})$.

The remaining $O(n)$ terms can be divided into two categories. Either the term has non-zero covariance matrix, or it is a translation along the line $y = -x$ relative to the mean $(L_i/2, L_i/2)$. By applying the 2-dimensional Chebyshev inequality to the terms with non-zero covariance matrix, we show that at least half of their distribution lies in a square with dimensions $O(\sqrt{n})$. Any Gaussian with dimensions $\Theta(\sqrt{n})$ centered in the square of dimensions $O(\sqrt{n})$ will have a fixed positive proportion $p$ of its distribution in the second quadrant and $p$ of its distribution in the fourth quadrant. Finally, a translation of any magnitude along the line $y = -x$ still leaves at least $p$ of the distribution in the second or fourth quadrant (although we do not know which one). However, the addition map is a bijection from $\mathbb{Z}_{2^n}$ to itself. So the number of strings that go from light to heavy equals the number of strings that go from heavy to light: both equal the number of light strings minus the number of light strings whose image is light. We conclude that at least $p$ of the distribution lies in the second quadrant and at least $p$ of the distribution lies in the fourth quadrant.
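The bijection argument can be checked directly for small $n$: translation by $\alpha$ permutes $\mathbb{Z}_{2^n}$, so the number of light-to-heavy strings equals the number of heavy-to-light strings. Below is a brute-force Python sketch. The function name is hypothetical, and light again means weight at most $n/2$.

```python
def quadrant_counts(alpha: int, n: int) -> tuple[int, int]:
    """Count light-to-heavy and heavy-to-light strings under s -> s + alpha in Z_{2^n}."""
    mod = 1 << n
    up = down = 0
    for s in range(mod):
        x = bin(s).count("1")                  # initial weight X
        y = bin((s + alpha) % mod).count("1")  # final weight Y
        if x <= n / 2 < y:
            up += 1    # second quadrant: light -> heavy
        elif y <= n / 2 < x:
            down += 1  # fourth quadrant: heavy -> light
    return up, down


print(quadrant_counts(0b01101, 5))  # the two counts always agree
```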


3.3 Computing the Distribution 

We first compute the 2-dimensional distribution of the initial and final weights of the $i$-th block conditioned on the carry bit from the $(i+1)$-th block. If the carry bit produced by the $i$-th block is denoted by $c_i$, then we want to understand the distribution of $(X_i, Y_i)$ given the carry bit $c_{i+1}$. Suppose that $\sigma_i = 1^{L_i}$. The case where $\sigma_i = 0^{L_i}$ is similar.

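Lemma 3.1. Suppose that $\sigma_i = 1^{L_i}$. The joint distribution of $(X_i, Y_i)$ conditioned on the carry bit $c_{i+1}$ is given by:
$$p_i(x, y \mid c_{i+1} = 1) = \begin{cases} \frac{1}{2^{L_i}} \binom{L_i}{x} & \text{if } x = y \ (\text{then } c_i = 1), \\ 0 & \text{else,} \end{cases}$$
$$p_i(x, y \mid c_{i+1} = 0) = \begin{cases} \frac{1}{2^{L_i}} & \text{if } (x, y) = (0, L_i) \ (\text{then } c_i = 0), \\ \frac{1}{2^{L_i}} \binom{L_i - y + x - 2}{x - 1} & \text{if } L_i - 1 \ge y \ge x - 1 \ge 0 \ (\text{then } c_i = 1), \\ 0 & \text{else.} \end{cases}$$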


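If $c_{i+1} = 1$, then adding $1^{L_i}$ together with the incoming carry adds $2^{L_i}$. This leaves the block unchanged and produces a carry, so $X_i = Y_i$ and $c_i = 1$. Hence, the probability mass function for $(X_i, Y_i)$ given $c_{i+1} = 1$ is given by
$$p_i(x, y \mid c_{i+1} = 1) = \begin{cases} \frac{1}{2^{L_i}} \binom{L_i}{x} & \text{if } x = y, \\ 0 & \text{else.} \end{cases}$$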

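If $c_{i+1} = 0$, then the distribution of $(X_i, Y_i)$ depends solely on the number $Z_i$ of trailing zeros of $S_i$. Indeed, adding $1^{L_i}$ with no incoming carry subtracts 1 modulo $2^{L_i}$: if $S_i \ne 0^{L_i}$, its $Z_i$ trailing zeros become ones and its lowest one becomes a zero. Hence
$$Y_i = \begin{cases} X_i + Z_i - 1 & \text{if } Z_i < L_i, \\ L_i & \text{if } Z_i = L_i. \end{cases}$$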

We therefore first compute the distribution of $Z_i$ conditioned on $X_i$, and use that to compute the joint distribution of $(X_i, Y_i)$. The distribution of $Z_i \mid X_i$ is given by
$$p_{Z_i}(z \mid x) = \begin{cases} 1 & \text{if } (x, z) = (0, L_i), \\ \binom{L_i - z - 1}{x - 1} \Big/ \binom{L_i}{x} & \text{if } L_i - z \ge x \ge 1, \\ 0 & \text{else.} \end{cases}$$


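Since $p_i(x, y \mid c_{i+1}) = p_{X_i}(x)\, p_{Y_i}(y \mid x)$, we compute $p_{X_i}(x)$ and $p_{Y_i}(y \mid x)$. As $X_i$ is binomial on $L_i$ trials with success probability $1/2$,
$$p_{X_i}(x) = \frac{1}{2^{L_i}} \binom{L_i}{x}$$
for $x = 0, 1, \dots, L_i$.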


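We can also write the distribution of $Y_i \mid X_i$ in terms of the distribution of $Z_i \mid X_i$:
$$p_{Y_i}(y_i \mid x_i) = \begin{cases} p_{Z_i}(y_i - x_i + 1 \mid x_i) & \text{if } 0 \le y_i - x_i + 1 < L_i, \\ p_{Z_i}(y_i - x_i \mid x_i) & \text{if } (x_i, y_i) = (0, L_i). \end{cases}$$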

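Hence, the joint distribution of $(X_i, Y_i)$ is
$$p_i(x, y \mid c_{i+1} = 0) = \begin{cases} \frac{1}{2^{L_i}} & \text{if } (x, y) = (0, L_i) \ (\text{then } c_i = 0), \\ \frac{1}{2^{L_i}} \binom{L_i - y + x - 2}{x - 1} & \text{if } L_i - 1 \ge y \ge x - 1 \ge 0 \ (\text{then } c_i = 1), \\ 0 & \text{else.} \end{cases}$$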
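Lemma 3.1 can be checked exhaustively for small block lengths. Add $1^{L}$ (with incoming carry 0) to every $s \in \{0,1\}^{L}$ and tally $(X_i, Y_i, c_i)$. The sketch below compares these tallies with $2^{L}$ times the mass function in the lemma. The function names are our own.

```python
from collections import Counter
from math import comb


def block_counts_enumerated(L: int) -> Counter:
    """Add the block 1^L with incoming carry 0 to every s in {0,1}^L; tally (X_i, Y_i, c_i)."""
    ones = (1 << L) - 1
    counts = Counter()
    for s in range(1 << L):
        total = s + ones
        t, carry_out = total & ones, total >> L
        counts[(bin(s).count("1"), bin(t).count("1"), carry_out)] += 1
    return counts


def block_counts_lemma(L: int) -> Counter:
    """2^L * p_i(x, y | c_{i+1} = 0) from Lemma 3.1, keyed together with the carry c_i."""
    counts = Counter({(0, L, 0): 1})  # s = 0^L becomes 1^L and produces no carry
    for x in range(1, L + 1):
        for y in range(x - 1, L):
            counts[(x, y, 1)] += comb(L - y + x - 2, x - 1)
    return counts


print(block_counts_enumerated(3) == block_counts_lemma(3))  # True
```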


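Similarly, if $\sigma_i = 0^{L_i}$, then the distribution of $(X_i, Y_i)$ is as follows. If $c_{i+1} = 0$, then $X_i = Y_i$, $c_i = 0$, and
$$p_i(x, y \mid c_{i+1} = 0) = \begin{cases} \frac{1}{2^{L_i}} \binom{L_i}{x} & \text{if } x = y \ (\text{then } c_i = 0), \\ 0 & \text{else.} \end{cases}$$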


When the carry bit makes the addition trivial, we call the resulting distribution the trivial distribution. Otherwise, the carry bit is $c_{i+1} = 1$. In this case, the distribution of $(X_i, Y_i)$ turns out to be symmetric to the case where $\sigma_i = 1^{L_i}$ and $c_{i+1} = 0$:
$$p_i(x, y \mid c_{i+1} = 1) = \begin{cases} \frac{1}{2^{L_i}} & \text{if } (x, y) = (L_i, 0) \ (\text{then } c_i = 1), \\ \frac{1}{2^{L_i}} \binom{L_i - x + y - 2}{y - 1} & \text{if } L_i - 1 \ge x \ge y - 1 \ge 0 \ (\text{then } c_i = 0), \\ 0 & \text{else.} \end{cases}$$
When the carry bit makes the addition nontrivial, as in this case, we call the resulting distribution the nontrivial distribution.
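The symmetry between a block $0^{L}$ receiving carry 1 and a block $1^{L}$ receiving carry 0 amounts to exchanging $x$ and $y$ and flipping the produced carry. Below is a small exhaustive check of that statement. The helper `enumerate_block` is hypothetical; it adds $\sigma_i$ and the incoming carry to every $s \in \{0,1\}^{L}$.

```python
from collections import Counter


def enumerate_block(bit: int, L: int, carry_in: int) -> Counter:
    """Tally (X_i, Y_i, c_i) when the block bit^L plus carry_in is added to each s in {0,1}^L."""
    mask = (1 << L) - 1
    block = mask if bit else 0
    counts = Counter()
    for s in range(1 << L):
        total = s + block + carry_in
        counts[(bin(s).count("1"), bin(total & mask).count("1"), total >> L)] += 1
    return counts


def symmetry_holds(L: int) -> bool:
    """Block 0^L with carry 1 should match block 1^L with carry 0, with x <-> y and c_i flipped."""
    ones_case = enumerate_block(1, L, 0)
    mirrored = Counter({(y, x, 1 - c): k for (x, y, c), k in ones_case.items()})
    return enumerate_block(0, L, 1) == mirrored


print(all(symmetry_holds(L) for L in range(1, 10)))  # True
```

The same helper also exhibits the trivial distributions: with `enumerate_block(1, L, 1)` or `enumerate_block(0, L, 0)` every tallied pair has $x = y$.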

Fixing the carry bits leads to four types of distributions for the blocks. The type depends on the carry bit coming in from the previous block addition and the resulting carry bit from the current block addition.

1. The block distribution is trivial and produces a carry bit that makes the subsequent block distribution non-trivial (trivial to non-trivial).

2. Non-trivial to trivial.

3. Non-trivial to non-trivial (block length $L = 1$).

4. Non-trivial to non-trivial (block length $L \ge 2$).

We distinguish between block length 1 and block length at least 2 for non-trivial to non-trivial distributions because the latter is the only distribution with invertible covariance matrix. Ideally, we will find many identical distributions of type 4, which will sum to a Gaussian with large enough dimensions. This will not be possible when most of the blocks have length 1, which we deal with separately.

Knowing the weight distribution of a block given the previous carry, it is straightforward to write down the distributions given both the previous carry and the produced carry. Again, we assume the block $\sigma_i = 1^{L_i}$.

As a trivial distribution always produces a non-trivial carry, the trivial to non-trivial distribution is the same as the trivial distribution:
$$p_i(x, y \mid c_{i+1} = 1, c_i = 1) = \begin{cases} \frac{1}{2^{L_i}} \binom{L_i}{x} & \text{if } x = y, \\ 0 & \text{else.} \end{cases}$$

A non-trivial distribution that produces a trivial carry must have $(X_i, Y_i) = (0, L_i)$. Also, a non-trivial distribution of block length 1 that produces a non-trivial carry must have $(X_i, Y_i) = (1, 0)$.

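Finally, a non-trivial distribution of block length greater than 1 that produces a non-trivial carry has the following distribution. Conditioning on $c_i = 1$, i.e. on $S_i \ne 0^{L_i}$, rescales the mass by $2^{L_i}/(2^{L_i} - 1)$:
$$p_i(x, y \mid c_{i+1} = 0, c_i = 1) = \begin{cases} \frac{1}{2^{L_i} - 1} \binom{L_i - y + x - 2}{x - 1} & \text{if } L_i - 1 \ge y \ge x - 1 \ge 0, \\ 0 & \text{else.} \end{cases}$$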


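We summarize these distributions in the next lemma.

Lemma 3.2. Suppose that $\sigma_i = 1^{L_i}$. The joint distribution of $(X_i, Y_i)$ conditioned on the carry bits $c_{i+1}$ and $c_i$ is given by:
$$p_i(x, y \mid c_{i+1} = 1, c_i = 1) = \begin{cases} \frac{1}{2^{L_i}} \binom{L_i}{x} & \text{if } x = y, \\ 0 & \text{else,} \end{cases}$$
$$p_i(x, y \mid c_{i+1} = 0, c_i = 0) = \begin{cases} 1 & \text{if } (x, y) = (0, L_i), \\ 0 & \text{else,} \end{cases}$$
$$p_i(x, y \mid c_{i+1} = 0, c_i = 1) = \begin{cases} \frac{1}{2^{L_i} - 1} \binom{L_i - y + x - 2}{x - 1} & \text{if } L_i - 1 \ge y \ge x - 1 \ge 0, \\ 0 & \text{else.} \end{cases}$$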


Observe that the last non-trivial to non-trivial probability distribution works for all lengths $L_i \ge 1$. However, when $L_i = 1$, we have $(x, y) = (1, 0)$ with probability 1. We still consider this a separate type of distribution. Its covariance matrix is all zeros, and consequently not invertible, which will be important for the analysis.

3.4 Computing the Covariance Matrix

Lemma 3.3. The covariance matrix $M$ of the trivial to non-trivial distribution of the random vector $(X_i, Y_i)$ is given by
$$M = \begin{pmatrix} \frac{L_i}{4} & \frac{L_i}{4} \\ \frac{L_i}{4} & \frac{L_i}{4} \end{pmatrix}.$$
The covariance matrix $M$ of the non-trivial to non-trivial distribution of the random vector $(X_i, Y_i)$ is given by
$$M = \begin{pmatrix} c & d \\ d & c \end{pmatrix},$$
where
$$c = \frac{L_i}{4}\left(1 + \frac{1}{2^{L_i} - 1}\right) - \frac{L_i^2}{4}\left(1 + \frac{1}{2^{L_i} - 1}\right)\frac{1}{2^{L_i} - 1}$$
and
$$d = \frac{L_i}{4}\left(1 + \frac{1}{2^{L_i} - 1}\right) + \frac{L_i^2}{4}\left(1 + \frac{1}{2^{L_i} - 1}\right)\frac{1}{2^{L_i} - 1} - 1.$$


Proof. To simplify our notation, let $(X(L), Y(L))$ denote some $(X_i, Y_i)$ with $L_i = L$. We begin with the trivial to non-trivial distribution. Since $X(L)$ is binomial on $L$ trials with success probability $1/2$, $\mathrm{Var}(X(L)) = L/4$. The bit-string corresponding to $Y(L)$ can be viewed as a translation of the bit-string corresponding to $X(L)$ in $\mathbb{Z}_{2^L}$, so the distribution of $Y(L)$ is the same as the distribution of $X(L)$. Hence, $\mathrm{Var}(Y(L)) = L/4$. It remains to compute $\mathrm{Cov}(X(L), Y(L))$. In the case of the trivial distribution, $X(L) = Y(L)$, so $\mathrm{Cov}(X(L), Y(L)) = \mathrm{Var}(X(L)) = L/4$.

The case of the non-trivial to non-trivial distribution requires more work. For our computation, we assume $\sigma_i = 1^{L}$. The non-trivial distributions for $\sigma_i = 0^{L}$ are obtained by swapping $x$ and $y$, so the covariances will be the same. We begin by evaluating
$$\mathrm{Var}(X(L)) = \mathbb{E}[X(L)^2] - \mathbb{E}[X(L)]^2.$$
As the block with weight $X(L)$ is chosen uniformly at random among all non-zero strings of length $L$,
$$\mathbb{E}[X(L)] = \frac{L\, 2^{L-1}}{2^L - 1} = \frac{L}{2}\left(1 + \frac{1}{2^L - 1}\right).$$

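We use repeated differentiation of the binomial theorem to compute $\sum_{k=0}^{L} k^2 \binom{L}{k}$. Starting from
$$\sum_{k=0}^{L} \binom{L}{k} x^k = (x + 1)^L,$$
differentiating with respect to $x$ and multiplying by $x$ yields
$$\sum_{k=0}^{L} k \binom{L}{k} x^k = L (x + 1)^{L-1} x.$$
Differentiating a second time with respect to $x$ gives
$$\sum_{k=0}^{L} k^2 \binom{L}{k} x^{k-1} = L(L - 1)(x + 1)^{L-2} x + L (x + 1)^{L-1}.$$
Plugging in $x = 1$ gives the sum we want:
$$\sum_{k=0}^{L} k^2 \binom{L}{k} = L(L - 1) 2^{L-2} + L\, 2^{L-1} = L(L + 1) 2^{L-2},$$
so that
$$\mathbb{E}[X(L)^2] = \frac{L(L + 1) 2^{L-2}}{2^L - 1} = \frac{L(L + 1)}{4}\left(1 + \frac{1}{2^L - 1}\right).$$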

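Hence, the variance of $X(L)$ is given by
$$\mathrm{Var}(X(L)) = \frac{L^2 + L}{4}\left(1 + \frac{1}{2^L - 1}\right) - \frac{L^2}{4}\left(1 + \frac{1}{2^L - 1}\right)^2 = \frac{L}{4}\left(1 + \frac{1}{2^L - 1}\right) - \frac{L^2}{4}\left(1 + \frac{1}{2^L - 1}\right)\frac{1}{2^L - 1}.$$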


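Observe that $Y(L)$ is the weight of a block of length $L$ chosen uniformly at random from all strings except $1^L$. So by symmetry (complementing every bit), $\mathrm{Var}(Y(L)) = \mathrm{Var}(X(L))$. We now compute $\mathrm{Cov}(X(L), Y(L)) = \mathbb{E}[X(L) Y(L)] - \mathbb{E}[X(L)]\,\mathbb{E}[Y(L)]$.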


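Since $xy$ is symmetric in $x$ and $y$, we may sum against the mirrored mass function (with the roles of $x$ and $y$ exchanged) and then substitute $y \mapsto y + 1$:
$$\begin{aligned} \mathbb{E}[X(L) Y(L)] &= \frac{1}{2^L - 1} \sum_{1 \le y \le x + 1 \le L} x y \binom{L - x + y - 2}{y - 1} \\ &= \frac{1}{2^L - 1} \sum_{x=0}^{L-1} x \sum_{y=1}^{x+1} y \binom{L - x + y - 2}{y - 1} \\ &= \frac{1}{2^L - 1} \sum_{x=0}^{L-1} x \sum_{y=0}^{x} (y + 1) \binom{L - x + y - 1}{y}. \end{aligned}$$

Let
$$A(x) = \sum_{y=0}^{x} (y + 1) \binom{L - x + y - 1}{y}$$
be the inner summation. Then by repeated application of the hockey stick identity,
$$\begin{aligned} A(x) &= (x + 1) \sum_{y=0}^{x} \binom{L - x + y - 1}{y} - \sum_{y=0}^{x} (x - y) \binom{L - x + y - 1}{y} \\ &= (x + 1) \binom{L}{x} - \sum_{y=0}^{x-1} \sum_{j=0}^{y} \binom{L - x + j - 1}{j} \\ &= (x + 1) \binom{L}{x} - \sum_{y=0}^{x-1} \binom{L - x + y}{y} \\ &= (x + 1) \binom{L}{x} - \binom{L}{x - 1}. \end{aligned}$$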

Substituting $A(x)$ back into our expression for $\mathbb{E}[X(L) Y(L)]$ yields
$$\mathbb{E}[X(L) Y(L)] = \frac{1}{2^L - 1}\left(\sum_{x=0}^{L-1} x(x + 1)\binom{L}{x} - \sum_{x=0}^{L-1} x \binom{L}{x - 1}\right).$$

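Let $B = \sum_{x=0}^{L-1} x(x + 1)\binom{L}{x}$ and let $C = \sum_{x=0}^{L-1} x \binom{L}{x - 1}$. We can simplify $B$ and $C$ by starting with the binomial theorem and applying standard generating function methods:
$$\sum_{k=0}^{L} \binom{L}{k} x^{k+1} = x(1 + x)^L.$$
Differentiating both sides with respect to $x$ gives
$$\sum_{k=0}^{L} (k + 1) \binom{L}{k} x^k = (1 + x)^{L-1}\bigl(1 + (L + 1)x\bigr). \tag{1}$$
Substituting $x = 1$ in equation (1) gives
$$\sum_{k=0}^{L} (k + 1)\binom{L}{k} = 2^{L-1}(L + 2).$$
Since $C = \sum_{k=0}^{L-2} (k + 1)\binom{L}{k}$ (substituting $k = x - 1$), we get
$$C + L\binom{L}{L - 1} + (L + 1)\binom{L}{L} = 2^{L-1}(L + 2), \qquad C = 2^{L-1}(L + 2) - (L^2 + L + 1).$$

To get $B$, we differentiate equation (1) with respect to $x$ once more:
$$\sum_{k=1}^{L} k(k + 1)\binom{L}{k} x^{k-1} = L(1 + x)^{L-2}\bigl(2 + (L + 1)x\bigr). \tag{2}$$
Substituting $x = 1$ in equation (2) gives
$$\sum_{k=1}^{L} k(k + 1)\binom{L}{k} = L\, 2^{L-2}(L + 3),$$
so
$$B + L(L + 1)\binom{L}{L} = L\, 2^{L-2}(L + 3), \qquad B = L\, 2^{L-2}(L + 3) - (L^2 + L).$$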

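Using the simplified expressions for $B$ and $C$, we get
$$\begin{aligned} \mathbb{E}[X(L) Y(L)] &= \frac{1}{2^L - 1}(B - C) \\ &= \frac{1}{2^L - 1}\Bigl(L\, 2^{L-2}(L + 3) - (L^2 + L) - 2^{L-1}(L + 2) + (L^2 + L + 1)\Bigr) \\ &= \frac{2^L}{2^L - 1}\left(\frac{L(L + 3)}{4} - \frac{L + 2}{2}\right) + \frac{1}{2^L - 1} \\ &= \left(1 + \frac{1}{2^L - 1}\right)\left(\frac{L^2 + L}{4} - 1\right) + \frac{1}{2^L - 1} \\ &= \frac{L^2 + L}{4}\left(1 + \frac{1}{2^L - 1}\right) - 1. \end{aligned}$$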


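Hence, since $\mathbb{E}[Y(L)] = \frac{L}{2}\left(1 - \frac{1}{2^L - 1}\right)$, the covariance of $X(L)$ and $Y(L)$ is
$$\begin{aligned} \mathrm{Cov}(X(L), Y(L)) &= \frac{L^2 + L}{4}\left(1 + \frac{1}{2^L - 1}\right) - 1 - \frac{L^2}{4}\left(1 + \frac{1}{2^L - 1}\right)\left(1 - \frac{1}{2^L - 1}\right) \\ &= \frac{L}{4}\left(1 + \frac{1}{2^L - 1}\right) + \frac{L^2}{4}\left(1 + \frac{1}{2^L - 1}\right)\frac{1}{2^L - 1} - 1. \end{aligned}$$
□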
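The closed forms for $c$ and $d$ can be double-checked with exact rational arithmetic by summing directly against the type 4 mass function of Lemma 3.2 (block $1^{L}$, $c_{i+1} = 0$, $c_i = 1$). Below is a Python sketch using `fractions.Fraction`. The function names are ours.

```python
from fractions import Fraction
from math import comb


def type4_moments(L: int):
    """Exact Var(X), Var(Y) and Cov(X, Y) for the type 4 mass function of Lemma 3.2."""
    N = 2**L - 1
    pts = [(x, y, Fraction(comb(L - y + x - 2, x - 1), N))
           for x in range(1, L + 1) for y in range(x - 1, L)]
    ex = sum(p * x for x, _, p in pts)
    ey = sum(p * y for _, y, p in pts)
    var_x = sum(p * x * x for x, _, p in pts) - ex**2
    var_y = sum(p * y * y for _, y, p in pts) - ey**2
    cov = sum(p * x * y for x, y, p in pts) - ex * ey
    return var_x, var_y, cov


def lemma33_c_d(L: int):
    """The closed forms for c and d stated in Lemma 3.3."""
    eps = Fraction(1, 2**L - 1)
    c = Fraction(L, 4) * (1 + eps) - Fraction(L * L, 4) * eps * (1 + eps)
    d = Fraction(L, 4) * (1 + eps) + Fraction(L * L, 4) * eps * (1 + eps) - 1
    return c, d


print(type4_moments(2))  # (Fraction(2, 9), Fraction(2, 9), Fraction(1, 9))
```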

3.5 Breaking up the Weight Distribution

Recall that the joint distribution of initial and final weights is the sum of the joint distributions of initial and final weights for $m = \Theta(n)$ smaller parts:
$$(X, Y) = \sum_{i=1}^{m} (X_i, Y_i).$$
However, the terms of this sum are dependent. We can remove the dependence by first sampling the carry bits according to their distribution. Given the carry bits, all of the terms in the sum are independent and have one of the four types of distributions given by Lemma 3.2. This gives us access to the Central Limit Theorem and to the fact that covariance matrices add, both of which will be used in the proof.

We will break up $(X, Y)$ into a sum of a Gaussian $(X_G, Y_G)$, a translation $(X_T, Y_T)$, and a remainder $(X_R, Y_R)$, which we show is well-behaved. The non-trivial to non-trivial distribution with block length at least 2 (type 4) is the only type with invertible covariance matrix, so our goal will be to find many identical distributions of this type. By the Central Limit Theorem, these sum to a 2-dimensional Gaussian $(X_G, Y_G)$ of dimensions $\Theta(\sqrt{n})$. This will be the main part of the sum that pushes the distribution into the second and fourth quadrants.

It is not always possible to find many identical distributions of type 4. If there are $o(n)$ blocks of length at least 2, then it is trivially impossible, and we deal with this case separately with a slightly modified argument. Otherwise, there are $\Theta(n)$ blocks of length at least 2. We will show that with probability at least $1/6$, the carry bits arrange themselves so that there are $\Theta(n)$ distributions of type 4. This is enough to find many identical distributions of type 4.

We then consider the sum $(X_R, Y_R)$ of the remaining type 4 distributions together with the trivial to non-trivial (type 1) distributions, and show that it is well-behaved. As the covariance matrices add, we will be able to apply the 2-dimensional Chebyshev inequality to guarantee that half of the distribution lies inside an ellipse of dimensions $O(\sqrt{n})$. This will be enough to guarantee some constant proportion $p$ of the distribution of $(X_G, Y_G) + (X_R, Y_R)$ in the second quadrant, and the same proportion $p$ in the fourth quadrant. The rest of the distribution of $(X, Y)$ is a translation $(X_T, Y_T)$ along the line $y = -x$ relative to the mean. After translation, we still have a $p$-fraction of the distribution in either the second or the fourth quadrant.

We first consider the case where there are $m' = \Theta(n)$ blocks of length at least 2. The following lemma says that with probability at least $1/6$ we get many distributions of type 4.

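Lemma 3.4. Suppose there are $m' = \Theta(n)$ blocks of length at least 2. Let $\mathcal{X}$ be the number of non-trivial to non-trivial distributions with block length at least 2. Then
$$\Pr\left(\mathcal{X} > \tfrac{1}{4} m'\right) \ge \tfrac{1}{6}.$$

Proof. We have
$$\begin{aligned} \mathbb{E}[\mathcal{X}] &= \sum_{i=1}^{m'} \Pr(\text{block } i \text{ is non-trivial to non-trivial}) \\ &= \sum_{i=1}^{m'} \Pr(\text{non-trivial carry out} \mid \text{non-trivial carry in}) \cdot \Pr(\text{non-trivial carry in}) \\ &\ge \sum_{i=1}^{m'} \frac{3}{4} \cdot \frac{1}{2} = \frac{3}{8} m'. \end{aligned}$$
Let $\mathcal{Y}$ be the number of blocks of length at least 2 that are not non-trivial to non-trivial. Then $\mathbb{E}[\mathcal{Y}] \le \frac{5}{8} m'$. By Markov's inequality,
$$\Pr(\mathcal{Y} \ge t) \le \frac{\mathbb{E}[\mathcal{Y}]}{t} \le \frac{5m'}{8t}.$$
Taking $t = \frac{3}{4} m'$ yields
$$\Pr\left(\mathcal{Y} \ge \tfrac{3}{4} m'\right) \le \tfrac{5}{6}, \qquad \Pr\left(\mathcal{Y} < \tfrac{3}{4} m'\right) \ge \tfrac{1}{6}, \qquad \Pr\left(\mathcal{X} > \tfrac{1}{4} m'\right) \ge \tfrac{1}{6}.$$
□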
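For a concrete exponent, the probability in Lemma 3.4 can be computed exactly by enumerating $S$. This sketch uses our own encoding and hypothetical function names:

- blocks are listed from the least significant end;
- the carry into bit position $p$ is read off from $(s + \alpha) \oplus s \oplus \alpha$;
- a block of length at least 2 counts as type 4 when its incoming carry differs from its bit and its outgoing carry equals its bit. For a block of 1's this means carry in 0 and carry out 1; for a block of 0's, carry in 1 and carry out 0.

```python
def blocks(alpha: int, n: int):
    """Maximal runs of alpha's n-bit representation, least significant first: (bit, start, length)."""
    runs, start = [], 0
    for pos in range(1, n + 1):
        if pos == n or ((alpha >> pos) & 1) != ((alpha >> start) & 1):
            runs.append(((alpha >> start) & 1, start, pos - start))
            start = pos
    return runs


def count_type4(alpha: int, n: int, s: int) -> int:
    """Number of blocks of length >= 2 that are non-trivial to non-trivial for S = s."""
    carries = (s + alpha) ^ s ^ alpha  # bit p is the carry into bit position p
    count = 0
    for bit, start, length in blocks(alpha, n):
        c_in = (carries >> start) & 1
        c_out = (carries >> (start + length)) & 1
        if length >= 2 and c_in != bit and c_out == bit:
            count += 1
    return count


def prob_many_type4(alpha: int, n: int) -> float:
    """Exact probability, over uniform S in Z_{2^n}, that count_type4 exceeds m'/4."""
    m_prime = sum(1 for _, _, length in blocks(alpha, n) if length >= 2)
    hits = sum(count_type4(alpha, n, s) > m_prime / 4 for s in range(1 << n))
    return hits / (1 << n)


print(prob_many_type4(0b001100110011, 12))
```

We pick an exponent whose least significant block consists of 1's, so that the lowest block also starts with a non-trivial carry; the printed value depends on $\alpha$ and is only guaranteed to be at least $1/6$.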

We now show that having many distributions of type 4 implies having many identical distributions of type 4.

Lemma 3.5. Suppose we have $m = \Theta(n)$ bit-strings of total length at most $n$. Then there is some fixed positive length $L$ such that $\ell = \Theta(n)$ bit-strings have length $L$.

Proof. Let $u_k$ be the number of blocks of length $k$. Then we have the following equations about the total number of blocks and the total length of all the blocks:
$$\sum_{k=1}^{n} u_k = m, \qquad \sum_{k=1}^{n} k \cdot u_k \le n.$$
Dividing the above equations by $m$ yields
$$\sum_{k=1}^{n} \frac{u_k}{m} = 1, \qquad \sum_{k=1}^{n} k \cdot \frac{u_k}{m} \le \frac{n}{m}.$$
Now consider the random variable $A$ that returns the length of a block chosen uniformly at random. The left-hand side of the second equation is the expected value of $A$:
$$\mathbb{E}[A] \le \frac{n}{m}.$$
Since $m = \Theta(n)$, $\frac{m}{n}$ is bounded below by a constant. More precisely, there is a constant $c_1 > 0$ such that $m \ge c_1 n$ for large enough $n$. So
$$\mathbb{E}[A] \le \frac{n}{m} \le \frac{1}{c_1}.$$
By Markov's inequality combined with the upper bound on the expected length, for any $k > 0$ we have
$$\Pr\left(A \ge \frac{k}{c_1}\right) \le \frac{c_1\, \mathbb{E}[A]}{k} \le \frac{1}{k}.$$
This means that at most $\frac{m}{k}$ blocks have length at least $\frac{k}{c_1}$. So there must be at least $\left(1 - \frac{1}{k}\right) m$ blocks of length less than $\frac{k}{c_1}$. By the pigeonhole principle, some length less than $\frac{k}{c_1}$ appears at least
$$\left(1 - \frac{1}{k}\right) \frac{c_1 m}{k}$$
times.

Essentially, some short length must appear very often. We pick the value of $k$ that maximizes this frequency, namely $k = 2$, and get that some short length appears at least $\frac{c_1}{4} m$ times. Taking $c = \frac{c_1}{4}$, the number $\ell$ of blocks of this length satisfies $\ell \ge cm$, and this constant $c$ is independent of the assignment of carry bits.
□
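The Markov-plus-pigeonhole step in Lemma 3.5 can be illustrated with a concrete choice of threshold. The sketch below is a variant under our own assumptions and does not use the proof's constants. It takes the threshold $K = \lceil 2n/m \rceil$, with $n$ the total length, so fewer than $m/2$ blocks are longer than $K$. Some length in $\{1, \dots, K\}$ must then occur at least $m/(2K)$ times. Function names are hypothetical.

```python
from collections import Counter
from itertools import product


def pigeonhole_lower_bound(lengths: list[int]) -> float:
    """Lower bound m / (2K) on the most frequent length, with threshold K = ceil(2n/m)."""
    m, n = len(lengths), sum(lengths)
    K = -(-2 * n // m)  # ceil(2n/m); fewer than n/K <= m/2 blocks are longer than K
    return m / (2 * K)


def most_common_count(lengths: list[int]) -> int:
    """Frequency of the most common length."""
    return Counter(lengths).most_common(1)[0][1]


def bound_holds_exhaustively(max_length: int, max_blocks: int) -> bool:
    """Check the bound on every list of at most max_blocks lengths in 1..max_length."""
    return all(
        most_common_count(list(ls)) >= pigeonhole_lower_bound(list(ls))
        for m in range(1, max_blocks + 1)
        for ls in product(range(1, max_length + 1), repeat=m)
    )


lengths = [1, 2, 2, 3, 1, 1, 4, 2]
print(most_common_count(lengths), pigeonhole_lower_bound(lengths))  # 3 1.0
```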


So with probability at least $1/6$, we can find $\ell = \Theta(n)$ identical type 4 distributions. These identical distributions are independent of each other, so the Central Limit Theorem together with Lemma 3.3 tells us that the distribution of their sum is approximately a Gaussian with covariance matrix
$$M_G = \begin{pmatrix} c\ell & d\ell \\ d\ell & c\ell \end{pmatrix},$$
where
$$c = \frac{L}{4}\left(1 + \frac{1}{2^L - 1}\right) - \frac{L^2}{4}\left(1 + \frac{1}{2^L - 1}\right)\frac{1}{2^L - 1}, \qquad d = \frac{L}{4}\left(1 + \frac{1}{2^L - 1}\right) + \frac{L^2}{4}\left(1 + \frac{1}{2^L - 1}\right)\frac{1}{2^L - 1} - 1, \qquad L \ge 2.$$

Lemma 3.6. Suppose $G$ is a 2-dimensional Gaussian distribution with covariance matrix $M_G$. Then a fixed positive proportion of the distribution of $G$ lies inside (and outside) an ellipse centered at the mean whose dimensions are at least $\frac{\sqrt{2\ell}}{3}$. Furthermore, the probability density function satisfies $f_G(x, y) \ge \frac{2}{\pi n} e^{-72n/\ell}$ inside a circle of radius $4\sqrt{n}$ centered at the mean of $G$.

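Proof. Observe first that
$$c - d = 1 - \frac{L^2}{2}\left(1 + \frac{1}{2^L - 1}\right)\frac{1}{2^L - 1} \ge \frac{1}{9}$$
for any value of $L \ge 2$. Similarly, $c + d = \frac{L}{2}\left(1 + \frac{1}{2^L - 1}\right) - 1 \ge \frac{1}{3}$ for $L \ge 2$. As $\det(M_G) = (c^2 - d^2)\ell^2 \ne 0$, $M_G$ is invertible. Let $(X_G, Y_G)$ denote the distribution obtained by translating $G$ so that its mean is at the origin. Then a fixed proportion of the distribution lies in the ellipse defined by
$$\begin{pmatrix} X_G & Y_G \end{pmatrix} M_G^{-1} \begin{pmatrix} X_G \\ Y_G \end{pmatrix} = 2.$$
This proportion is $1 - e^{-1}$, since the quadratic form has a $\chi^2$ distribution with two degrees of freedom. The inverse of $M_G$ is given by
$$M_G^{-1} = \frac{1}{(c^2 - d^2)\ell^2} \begin{pmatrix} c\ell & -d\ell \\ -d\ell & c\ell \end{pmatrix}.$$
Substituting $M_G^{-1}$ back into the equation of the ellipse gives
$$\frac{1}{(c^2 - d^2)\ell}\left[c X_G^2 - 2d X_G Y_G + c Y_G^2\right] = 2, \quad\text{i.e.}\quad X_G^2 - \frac{2d}{c} X_G Y_G + Y_G^2 = \frac{2(c^2 - d^2)\ell}{c}.$$
We have an equation of the form $x^2 - 2axy + y^2 = b$, where $a = \frac{d}{c} < 1$. This describes an ellipse rotated by $\frac{\pi}{4}$ counterclockwise. By rotating the ellipse clockwise by $\frac{\pi}{4}$, we can find the dimensions of the ellipse. Making the substitution
$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{2}}(x' + y') \\ \frac{1}{\sqrt{2}}(y' - x') \end{pmatrix},$$
we get that the equation of the rotated ellipse is
$$\frac{x'^2}{b/(1 + a)} + \frac{y'^2}{b/(1 - a)} = 1.$$
Taking $a = \frac{d}{c}$ and $b = \frac{2(c^2 - d^2)\ell}{c}$ as in the ellipse for our Gaussian, we find that the squares of the dimensions of the ellipse are given by
$$\frac{b}{1 - a} = \frac{2(c^2 - d^2)\ell}{c} \cdot \frac{c}{c - d} = 2(c + d)\ell \ge \frac{2\ell}{3}, \qquad \frac{b}{1 + a} = \frac{2(c^2 - d^2)\ell}{c} \cdot \frac{c}{c + d} = 2(c - d)\ell \ge \frac{2\ell}{9}.$$

Hence, both dimensions of the ellipse are $\Omega(\sqrt{\ell})$, which is $\Omega(\sqrt{n})$ since $\ell = \Theta(n)$. In fact, both dimensions exceed $\frac{\sqrt{2\ell}}{3}$, so the circle of radius $\frac{\sqrt{2\ell}}{3}$ centered at the mean lies completely inside the ellipse. Scaling every dimension up by a factor of $\frac{12\sqrt{n}}{\sqrt{2\ell}}$ tells us that inside the circle of radius $4\sqrt{n}$ centered at the mean, we have
$$\begin{pmatrix} X_G & Y_G \end{pmatrix} M_G^{-1} \begin{pmatrix} X_G \\ Y_G \end{pmatrix} \le 2 \cdot \frac{144n}{2\ell} = \frac{144n}{\ell}.$$
Therefore, we have the following lower bound on the probability density function inside the circle of radius $4\sqrt{n}$:
$$f_G(x, y) \ge \frac{1}{2\pi\sqrt{\det M_G}}\, e^{-\frac{1}{2} \cdot \frac{144n}{\ell}} = \frac{1}{2\pi\ell\sqrt{c^2 - d^2}}\, e^{-72n/\ell} \ge \frac{1}{2\pi c\ell}\, e^{-72n/\ell} \ge \frac{2}{\pi n}\, e^{-72n/\ell}.$$
The last step uses $\ell L \le n$ and $c = \frac{L}{4}\left(1 + \frac{1}{2^L - 1}\right)\left(1 - \frac{L}{2^L - 1}\right) \le \frac{L}{4}$.
□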
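The constants used for the ellipse, $c - d \ge 1/9$ and $c + d \ge 1/3$, together with $c \le L/4$ from the last step, can be checked exactly over a range of block lengths. Below is a short Python sketch with `fractions.Fraction`; the function names are ours. It checks only finitely many $L$ and is an illustration, not a proof.

```python
from fractions import Fraction


def type4_c_d(L: int) -> tuple[Fraction, Fraction]:
    """Closed forms for c and d in Lemma 3.3 (type 4 block of length L)."""
    eps = Fraction(1, 2**L - 1)
    c = Fraction(L, 4) * (1 + eps) - Fraction(L * L, 4) * eps * (1 + eps)
    d = Fraction(L, 4) * (1 + eps) + Fraction(L * L, 4) * eps * (1 + eps) - 1
    return c, d


def lemma36_bounds_hold(L: int) -> bool:
    """Check c - d >= 1/9, c + d >= 1/3 and c <= L/4 for one block length L >= 2."""
    c, d = type4_c_d(L)
    return c - d >= Fraction(1, 9) and c + d >= Fraction(1, 3) and c <= Fraction(L, 4)


print(all(lemma36_bounds_hold(L) for L in range(2, 65)))  # True
print(type4_c_d(2))  # (Fraction(2, 9), Fraction(1, 9)): both bounds are tight at L = 2
```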

The above sequence of lemmas shows the following. If we start with many blocks of length at least 2, then with positive constant probability we can find many identical distributions that sum to a Gaussian of dimensions $\Omega(\sqrt{n})$. Suppose now that the exponent $\alpha$ has a total of $m$ blocks, but fewer than $0.01m$ blocks of length at least 2. Then at least a 0.99 fraction of the blocks have length 1.

Consider all consecutive block pairs. At most a 0.01 fraction of these pairs have their first block of length at least 2, and at most a 0.01 fraction have their second block of length at least 2. So at most a 0.02 fraction have a block of length at least 2. Hence, a 0.98 fraction of the pairs consist of two blocks of length 1. By the pigeonhole principle, at least a 0.49 fraction of the pairs are 01, or at least a 0.49 fraction are 10. Without loss of generality, assume that a 0.49 fraction of consecutive block pairs are 01.

We now treat each block pair 01 as a single block of length 2. The initial and final weight distribution of this larger block, given that there is no carry in and no carry out, matches the type 4 distribution. We have proven the existence of a large number of modified blocks of length 2:

Lemma 3.7. Suppose fewer than a 0.01 fraction of the blocks have length at least 2. Then at least a 0.49 fraction of consecutive block pairs are 01, or at least a 0.49 fraction of consecutive block pairs are 10.

Lemma 3.7 essentially reduces the case of few blocks of length at least 2 to the case of many such blocks, by consolidating many of the length 1 blocks. There are $\Theta(n)$ such consolidated blocks, and each has a type 4 distribution with probability bounded below by a positive constant. So we can again find $\Theta(n)$ identical type 4 distributions by Lemma 3.4. Lemma 3.6 then says that these identical distributions sum to a Gaussian of large dimensions. So for any $\alpha$, we can find many terms in the initial and final weight distribution summing to a large Gaussian.


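3.6 Distribution of the Sum of the Remaining Terms

Consider the terms remaining in the distribution of $(X, Y) = \sum_{i=1}^{m} (X_i, Y_i)$ when the terms contributing to the Gaussian are removed. The terms with distribution types 2 or 3 are translations in the $\langle 1, -1 \rangle$ direction relative to $(L_i/2, L_i/2)$. These contribute to the translation part of the distribution, $(X_T, Y_T)$. The rest of the terms of type 4, along with the terms of type 1, sum to the remainder $R = (X_R, Y_R)$. There are $O(n)$ terms remaining. By Lemma 3.3, the covariance matrix of each of these terms has one of the following two forms:
$$\begin{pmatrix} \frac{L}{4} & \frac{L}{4} \\ \frac{L}{4} & \frac{L}{4} \end{pmatrix}, \qquad \begin{pmatrix} c & d \\ d & c \end{pmatrix},$$
where $c = \frac{L}{4}\left(1 + \frac{1}{2^L - 1}\right) - \frac{L^2}{4}\left(1 + \frac{1}{2^L - 1}\right)\frac{1}{2^L - 1}$ and $d = \frac{L}{4}\left(1 + \frac{1}{2^L - 1}\right) + \frac{L^2}{4}\left(1 + \frac{1}{2^L - 1}\right)\frac{1}{2^L - 1} - 1$.

As the terms are independent given a fixing of the carry bits, the covariance matrices add. The total covariance matrix of the sum is
$$M_R = \begin{pmatrix} C & D \\ D & C \end{pmatrix},$$
where $D \le C \le \frac{n}{4}$, with $D = C$ only when the remainder is a sum of type 1 distributions.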

Lemma 3.8. At least half of the distribution of the remainder lies in a circle 
of radius \/^ centered at the mean. 

Proof. When the remainder is a sum of type 1 distributions, the remainder has the form $(X_R, X_R)$, where $X_R$ is a binomial random variable with $L_R \le n$ trials and success probability $\frac{1}{2}$, hence with variance $\frac{L_R}{4}$. So by Chebyshev's inequality, for every $k > 0$,

$$\Pr\left[\,\left|X_R - \tfrac{L_R}{2}\right| \ge k\,\right] \le \frac{L_R}{4k^2} \le \frac{n}{4k^2}.$$

So in this case, at least half of the distribution of the remainder lies in a circle of the radius claimed in the lemma. When the remainder contains some type 4 distributions, then $D < C$ strictly, so $\det M_R = C^2 - D^2 > 0$ and the covariance matrix $M_R$ is invertible. So we may apply the 2-dimensional Chebyshev inequality to the centered pair $(\bar X_R, \bar Y_R) = (X_R, Y_R) - \mathbb{E}[(X_R, Y_R)]$ to get:

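$$\Pr\left[(\bar X_R, \bar Y_R)\, M_R^{-1}\, (\bar X_R, \bar Y_R)^{\mathsf T} < t^2\right] \ge 1 - \frac{2}{t^2} \qquad \text{for all } t > 0.$$

Since $M_R^{-1} = \frac{1}{C^2 - D^2}\begin{pmatrix} C & -D \\ -D & C \end{pmatrix}$, taking $t = 2$ yields

$$\Pr\left[\bar X_R^2 - \frac{2D}{C}\,\bar X_R \bar Y_R + \bar Y_R^2 < \frac{4(C^2 - D^2)}{C}\right] \ge \frac{1}{2}.$$

Chebyshev tells us that at least half of the distribution lies in the ellipse centered at the origin defined by the inequality above. By a similar computation as with the Gaussian distribution, the axes of this ellipse point along the eigenvectors $(1, 1)$ and $(1, -1)$ of $M_R$, with eigenvalues $C + D$ and $C - D$, so the squares of its semi-axes are $4(C + D)$ and $4(C - D)$, both of which are less than ^. Hence, the ellipse lies inside a circle of radius $2\sqrt{C + D} <$ \/^, and therefore over half of the distribution of the remainder lies inside this circle. □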
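The two-dimensional Chebyshev step can be sanity-checked numerically. The sketch below assumes the standard form $\Pr[v^{\mathsf T} M^{-1} v \ge t^2] \le 2/t^2$ for a mean-zero vector v in the plane with invertible covariance M, and uses a made-up test vector $X = S_1 + S_0$, $Y = S_1 + S_2$ built from independent binomials, whose covariance is $\begin{pmatrix} C & D \\ D & C \end{pmatrix}$ with $C = (k_1 + k_0)/4$ and $D = k_1/4$. All function names and parameters are illustrative assumptions, not the paper's.

```python
import math
import random


def quad_form(x, y, C, D):
    """v^T M^{-1} v for v = (x, y) and M = [[C, D], [D, C]] with |D| < C."""
    return (C * x * x - 2 * D * x * y + C * y * y) / (C * C - D * D)


def ellipse_semi_axes_sq(C, D, t):
    """Squared semi-axes of {v : v^T M^{-1} v < t^2}, along (1, 1) and (1, -1)."""
    return t * t * (C + D), t * t * (C - D)


def binom_half(k, rng):
    """Hamming weight of k uniform random bits, i.e. a Binomial(k, 1/2) sample (k >= 1)."""
    return bin(rng.getrandbits(k)).count("1")


def chebyshev_check(k_shared=40, k_private=20, t=2, trials=20_000, seed=1):
    """Empirical Pr[v^T M^{-1} v < t^2] for a made-up vector, versus the bound 1 - 2/t^2.

    X = S1 + S0 and Y = S1 + S2 with S1 ~ Bin(k_shared, 1/2) and S0, S2 ~ Bin(k_private, 1/2)
    independent, so C = (k_shared + k_private)/4 and D = k_shared/4.
    """
    rng = random.Random(seed)
    C = (k_shared + k_private) / 4
    D = k_shared / 4
    mean = (k_shared + k_private) / 2
    inside = 0
    for _ in range(trials):
        shared = binom_half(k_shared, rng)
        x = shared + binom_half(k_private, rng) - mean
        y = shared + binom_half(k_private, rng) - mean
        if quad_form(x, y, C, D) < t * t:
            inside += 1
    return inside / trials, 1 - 2 / (t * t)


frac, bound = chebyshev_check()
print(f"empirical {frac:.3f} vs Chebyshev bound {bound:.3f}")
print([math.sqrt(v) for v in ellipse_semi_axes_sq(15.0, 10.0, 2)])  # semi-axis lengths
```

Chebyshev is only a worst-case bound, so for a near-Gaussian vector like this one the empirical fraction should sit well above $1/2$.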


3.7 The Proof 

We are ready to prove the main theorem. 

Proof of Theorem 1.2. Suppose that a has $m > cn$ blocks in its binary representation. Then either at least $0.01m$ blocks have length at least 2, or fewer than $0.01m$ blocks do. In the second case, Lemma 3.7 tells us that we can find $0.49m$ identical pairs of consecutive blocks of length 1. Lemma 3.4 then says that, with probability at least g, the carry bits arrange themselves in such a way that there are at least 11^m > ^m identical type 4 distributions.

If there are $m' \ge 0.01m$ blocks of length at least 2, then Lemma 3.4 says that, with probability at least i, the carry bits arrange themselves in such a way that there are at least jm' > ^m type 4 distributions. Since ^ we conclude that for any a with m blocks, we can find ^m type 4 distributions with probability g.
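The case split above depends only on the block lengths of a. The sketch below assumes a is written with exactly n bits (padded with leading zeros) and that a block is a maximal run of equal bits; `block_lengths` and `proof_case` are hypothetical helpers that report which branch of the argument applies.

```python
from itertools import groupby


def block_lengths(a, n):
    """Lengths of the maximal runs of equal bits in the n-bit binary representation of a."""
    bits = format(a, f"0{n}b")  # assumption: pad with leading zeros to n bits
    return [len(list(run)) for _, run in groupby(bits)]


def proof_case(a, n, frac=0.01):
    """Return (case, m, number of blocks of length >= 2), where case is 'long' when at
    least frac * m blocks have length >= 2 and 'short' otherwise."""
    lengths = block_lengths(a, n)
    m = len(lengths)
    long_blocks = sum(1 for length in lengths if length >= 2)
    case = "long" if long_blocks >= frac * m else "short"
    return case, m, long_blocks


print(proof_case(0b1010101010, 10))  # alternating bits: every block has length 1
print(proof_case(0b1100110011, 10))  # every block has length 2
```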

As $m > cn$, the number of identical type 4 distributions exceeds ^m > ■^n. By Lemma 3.5, we can find l > • ^m > identical type 4 distributions, each with block length L. By Lemma 3.6, these sum to a Gaussian whose probability density function $f_G(x, y)$ exceeds -^e inside a circle of radius ‘^^/n centered at its mean. As each type 4 distribution has mean of the form $(\frac{L}{2}, \frac{L}{2}) \pm k(1, -1)$, we decompose the Gaussian into a Gaussian $G = (X_G, Y_G)$ whose mean lies on the diagonal $x = y$, plus a translation in the $(1, -1)$ direction, which contributes to the translation term $(X_T, Y_T)$.

It is worth noting that every distribution type of block length L can be decomposed into the sum of a distribution centered at $(\frac{L}{2}, \frac{L}{2})$ and a translation in the $(1, -1)$ direction. To see this, we write the mean of each type of distribution as $(\frac{L}{2}, \frac{L}{2}) + k(1, -1)$, for some k depending on L.

Type 1 distributions have mean $(\frac{L}{2}, \frac{L}{2})$, that is, $k = 0$. Type 2, type 3, and type 4 distributions have means of the form $(\frac{L}{2}, \frac{L}{2}) \pm k(1, -1)$, where the value of k depends on the type and on L.
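The decomposition of a mean into a diagonal part plus a multiple of $(1, -1)$ is simple arithmetic: any point $(\mu_x, \mu_y)$ equals $(b, b) + k(1, -1)$ with $b = (\mu_x + \mu_y)/2$ and $k = (\mu_x - \mu_y)/2$, so the claim above amounts to $\mu_x + \mu_y = L$ for every type. A small sketch with exact fractions follows; the helper names and the sample means are hypothetical.

```python
from fractions import Fraction


def decompose_mean(mx, my):
    """Write (mx, my) = (b, b) + k*(1, -1) and return (b, k).

    b = (mx + my)/2 is the diagonal part; k = (mx - my)/2 is the coefficient
    of the translation in the (1, -1) direction.
    """
    mx, my = Fraction(mx), Fraction(my)
    return (mx + my) / 2, (mx - my) / 2


def split_means(means):
    """Sum the diagonal parts and the (1, -1) coefficients over several independent terms."""
    parts = [decompose_mean(mx, my) for mx, my in means]
    return sum(b for b, _ in parts), sum(k for _, k in parts)


# Hypothetical means of three terms of block length L = 4, so b = L/2 = 2 for each.
print(split_means([(3, 1), (2, 2), (1, 3)]))
```

Because means add over independent terms, the diagonal parts and the translation coefficients can be summed separately, which is what extracting $(X_T, Y_T)$ relies on.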

We extract the translation component from each term and call their sum $(X_T, Y_T)$. Let $R = (X_R, Y_R) = (X, Y) - (X_G, Y_G) - (X_T, Y_T)$ be the remainder. If $L_R$ denotes the total length of all blocks contributing to R, then $(\frac{L_R}{2}, \frac{L_R}{2})$ is the mean of R. Let $\bar R = R - (\frac{L_R}{2}, \frac{L_R}{2})$. By Lemma 3.8, at least half of the distribution of $\bar R$ lies in a circle centered at the origin whose radius is the one given in that lemma. Taking W to be the square circumscribing this circle (its side length is twice the radius), we see that at least half of the distribution of $\bar R$ lies in W.

For any point $(p, q)$ in the square, consider the distribution of the recentered Gaussian $G - \mathbb{E}[G] + (p, q)$, which is centered at $(p, q)$. Since it is a translate of G, Lemma 3.6 guarantees that its probability density function exceeds the same lower bound inside a circle of the same radius centered at $(p, q)$. Contained within this circle is a square lying in the second quadrant. Hence, the probability that $G - \mathbb{E}[G] + (p, q)$ lies in the second quadrant is at least the density lower bound times the area of this square.
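The step from a density bound to a probability bound can be illustrated numerically: if a density is at least f on a square of side $\ell$ inside the second quadrant, the probability of the second quadrant is at least $f \ell^2$. A Gaussian density is log-concave, so its minimum over a square is attained at a corner. The sketch below uses a hypothetical covariance `[[s, r], [r, s]]`, a hypothetical center, and the square $[-\ell, 0] \times [0, \ell]$, and compares the bound with a Monte Carlo estimate; none of these numbers come from the proof.

```python
import math
import random


def gaussian_pdf(x, y, mu, s, r):
    """Density at (x, y) of a 2-D Gaussian with mean mu and covariance [[s, r], [r, s]], |r| < s."""
    det = s * s - r * r
    dx, dy = x - mu[0], y - mu[1]
    quad = (s * dx * dx - 2 * r * dx * dy + s * dy * dy) / det  # (v - mu)^T Sigma^{-1} (v - mu)
    return math.exp(-quad / 2) / (2 * math.pi * math.sqrt(det))


def quadrant_lower_bound(mu, s, r, side):
    """Area of the square [-side, 0] x [0, side] times the minimum density on it.

    The Gaussian density is log-concave, so its minimum over the square is at a corner.
    """
    corners = [(-side, 0.0), (0.0, 0.0), (-side, side), (0.0, side)]
    return side * side * min(gaussian_pdf(x, y, mu, s, r) for x, y in corners)


def second_quadrant_mc(mu, s, r, trials=100_000, seed=0):
    """Monte Carlo estimate of Pr[x < 0 and y > 0] for the same Gaussian."""
    rng = random.Random(seed)
    l11 = math.sqrt(s)  # Cholesky factor of [[s, r], [r, s]]
    l21 = r / l11
    l22 = math.sqrt(s - l21 * l21)
    hits = 0
    for _ in range(trials):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = mu[0] + l11 * z1
        y = mu[1] + l21 * z1 + l22 * z2
        if x < 0 and y > 0:
            hits += 1
    return hits / trials


center = (1.0, -0.5)  # hypothetical point (p, q) of the square W
bound = quadrant_lower_bound(center, s=4.0, r=1.0, side=2.0)
estimate = second_quadrant_mc(center, s=4.0, r=1.0)
print(f"lower bound {bound:.4f} <= estimate {estimate:.4f}")
```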

Recall the lower bound on l obtained above, where $m > cn$. So we have:

$$\frac{2}{\pi}\, e^{-144 \cdot 640000\,(\cdots)} = \frac{2}{\pi}\, e^{-92160000\,(\cdots)}.$$

Let C be the resulting lower bound. Then, conditioned on the carry bits, at least a C fraction of the distribution of $G - \mathbb{E}[G] + \bar R$ lies in the second quadrant, where C is a constant depending only on c. By symmetry, the same fraction lies in the fourth quadrant. Finally, we must add the translation $(X_T, Y_T)$ in the $(1, -1)$ direction. No matter the size of the translation, we are guaranteed a C fraction in either the second or the fourth quadrant. Hence, we have at least ^ of the unconditioned distribution of $(X, Y) = (X_G, Y_G) + (X_R, Y_R) + (X_T, Y_T)$ lying in the second or fourth quadrants relative to the mean $(\frac{n}{2}, \frac{n}{2})$, and so at least yI lying in either the second or the fourth quadrant. By symmetry, we get at least Y 5 lying in both the second and fourth quadrants. □
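For small n, the quantity in the theorem can be computed by brute force. The sketch below assumes, as in the Introduction, that S is uniform on $\{1, \dots, 2^n\}$, that Y is the weight of the ordinary integer sum S + a (no modular reduction), and that quadrants are taken strictly around $(\frac{n}{2}, \frac{n}{2})$; `quadrant_fractions` is a hypothetical helper.

```python
def weight(x):
    """Hamming weight (number of 1 bits) of a non-negative integer."""
    return bin(x).count("1")


def quadrant_fractions(a, n):
    """Fractions of S in {1, ..., 2^n} whose (X, Y) = (wt(S), wt(S + a)) lies strictly
    in the second quadrant (X < n/2 < Y) or strictly in the fourth quadrant
    (Y < n/2 < X) around (n/2, n/2)."""
    half = n / 2
    second = fourth = 0
    for s in range(1, 2 ** n + 1):
        x, y = weight(s), weight(s + a)
        if x < half < y:
            second += 1
        elif y < half < x:
            fourth += 1
    total = 2 ** n
    return second / total, fourth / total


n = 12
many_blocks = int("10" * (n // 2), 2)  # 101010101010: n blocks of length 1
few_blocks = 1                         # 000000000001: 2 blocks
for a in (many_blocks, few_blocks):
    print(format(a, f"0{n}b"), quadrant_fractions(a, n))
```

Small cases like this only make the statement concrete; they say nothing about the size of the constant C.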



4 Heavily Shifting Numbers 

We have shown that integers a with many blocks of 0's and 1's have the shifting property. An interesting related question is whether there is an a that is heavily shifting: that is, a shifts almost all of the light strings to heavy strings. More precisely, only an $o(1)$ fraction of light strings remain light under translation by a. We already know that when a has $o(\sqrt{n})$ blocks, a does not have the $\epsilon$-shifting property, and therefore cannot be heavily shifting.
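To make the definition of heavily shifting concrete, the sketch below counts the blocks of a and the fraction of light S that stay light after adding a. It assumes a hypothetical notion of light (Hamming weight below $n/2$) and ordinary integer addition; the paper's precise thresholds may differ, and exhaustive search at small n cannot settle the asymptotic question.

```python
from itertools import groupby


def num_blocks(a, n):
    """Number of maximal runs of equal bits in the n-bit representation of a."""
    return sum(1 for _ in groupby(format(a, f"0{n}b")))


def stay_light_fraction(a, n):
    """Fraction of light S in {1, ..., 2^n} for which S + a is still light.

    Hypothetical definition: S is light when its Hamming weight is below n/2.
    A heavily shifting a would make this fraction o(1) as n grows.
    """
    def is_light(x):
        return bin(x).count("1") < n / 2

    light = [s for s in range(1, 2 ** n + 1) if is_light(s)]
    return sum(1 for s in light if is_light(s + a)) / len(light)


for n in (8, 10, 12, 14):
    a = int("10" * (n // 2), 2)  # alternating bits: n blocks of length 1
    print(n, num_blocks(a, n), round(stay_light_fraction(a, n), 3))
```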

Our current understanding of the joint initial and final weight distribution cannot quite show that integers a with $\Theta(n)$ blocks are also not heavily shifting. The reason is that we have no handle on the size of the translation term $(X_T, Y_T)$ in the $(1, -1)$ direction. It is possible that the translation is large enough, most of the time, to make a heavily shifting, though we suspect this does not happen.

It is an open problem to determine which a are heavily shifting.